Experiments on Element and Document Statistics for XML Retrieval based on tree matching

نویسندگان

Mohamed Ben Aouicha

Mohamed Tmar

Mohand Boughanem

Mohamed Abid

چکیده

This paper presents an information retrieval model on XML documents based on tree matching. Queries and documents are represented by extended trees. An extended tree is built starting from the original tree, with additional weighted virtual links between each node and its indirect descendants allowing to directly reach each descendant. Therefore only one level separates between each node and its indirect descendants. This allows to compare the user query and the document with flexibility and with respect to the structural constraints of the query. The content of each node is very important to decide weither a document element is relevant or not, thus the content should be taken into account in the retrieval process. We separate between the structure-based and the contentbased retrieval processes. The content-based score of each node is commonly based on the well-known Tf × Idf criteria. In this paper, we compare between this criteria and another one we call Tf× Ief . The comparison is based on some experiments, this model is used for content only queries (CO) and content and structure ones (CAS). Some experiments have been undertaken into 2 datasets provided by INEX to show the effectiveness of our approach on one hand and those of both weighting functions on the other. Keywords—XML retrieval, INEX’04, INEX’05, Tf × Idf , Tf × Ief

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Experiments on Element and Document Statistics for XML Retrieval

متن کامل

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Improved Skips for Faster Postings List Intersection

متن کامل

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2009

Experiments on Element and Document Statistics for XML Retrieval based on tree matching

نویسندگان

چکیده

منابع مشابه

Experiments on Element and Document Statistics for XML Retrieval

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Improved Skips for Faster Postings List Intersection

Improved Skips for Faster Postings List Intersection

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

عنوان ژورنال:

اشتراک گذاری